Add audio diagnostic agent to detect audio classification mismatches in videos #66

Copilot · 2025-11-06T19:40:44Z

Implements a Python CLI agent to automatically audit audio in videos, detecting potential misclassifications (e.g., "snore" detected when audio contains "bark") by analyzing spectrograms and frequency band energies.

Core Implementation

CLI Tool (scripts/audio_diagnostic_agent.py, 510 LOC)

Processes a single video or every video in a directory
Generates JSON reports with sample rate analysis, spectrogram parameters, frequency band energies, and top-k predictions
Flags a file as suspicious on a sample rate mismatch or when band energies differ by more than a threshold (10 dB by default)
Attempts to call the project's inference function, falling back to labels.txt or the ESC-50 class list
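
The flagging rule described above can be sketched as follows. This is a minimal illustration, not the code in scripts/audio_diagnostic_agent.py: the helper name flag_suspicion, its parameters, and the exact wording of the sample rate message are assumptions. Only the two rules (sample rate mismatch, band energy spread above a dB threshold) and the "Large energy variation" message format come from the description.

```python
from typing import Dict, List, Tuple


def flag_suspicion(
    original_sr: int,
    used_sr: int,
    band_energies_db: Dict[str, float],
    energy_diff_threshold: float = 10.0,
    check_sample_rate: bool = True,
) -> Tuple[bool, List[str]]:
    """Return (suspicious, reasons) for one file. Hypothetical helper."""
    reasons: List[str] = []

    # Rule 1: the audio was analysed at a rate other than the original one.
    if check_sample_rate and original_sr != used_sr:
        reasons.append(f"Sample rate mismatch: {original_sr} Hz -> {used_sr} Hz")

    # Rule 2: the spread between the loudest and quietest band exceeds the threshold.
    if len(band_energies_db) >= 2:
        spread = max(band_energies_db.values()) - min(band_energies_db.values())
        if spread > energy_diff_threshold:
            reasons.append(f"Large energy variation: {spread:.1f} dB")

    return bool(reasons), reasons
```

Returning the reasons alongside the boolean keeps the report self-explanatory: a reviewer can see which rule fired without re-running the tool.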

Audio Utilities (scripts/utils_audio.py, 232 LOC)

get_sample_rate() - ffprobe integration
extract_audio_wav() - ffmpeg extraction preserving sample rate
compute_mel_spectrogram() - librosa with configurable n_fft/hop_length/n_mels/fmin/fmax
measure_energy_in_band() - frequency range energy analysis
save_spectrogram_image() - matplotlib PNG generation
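
A rough sketch of how the ffprobe and ffmpeg steps could be wired up is shown below. The function names ending in _cmd, parse_sample_rate, and get_sample_rate_sketch are hypothetical and do not necessarily match scripts/utils_audio.py; the ffprobe and ffmpeg flags are standard ones, and the choice of 16-bit PCM output is an assumption. The 60 second timeout mirrors the one this PR adds to extract_audio_wav.

```python
import subprocess
from typing import List


def ffprobe_sample_rate_cmd(video_path: str) -> List[str]:
    """Build an ffprobe command that prints only the first audio stream's sample rate."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=sample_rate",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]


def parse_sample_rate(ffprobe_stdout: str) -> int:
    """Parse ffprobe's output (one number per line) into an integer rate in Hz."""
    lines = [ln.strip() for ln in ffprobe_stdout.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("ffprobe reported no audio stream")
    return int(lines[0])


def ffmpeg_extract_cmd(video_path: str, wav_path: str, sample_rate: int) -> List[str]:
    """Build an ffmpeg command that drops video and writes PCM WAV at the given rate."""
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-vn",                    # no video stream in the output
        "-acodec", "pcm_s16le",   # assumption: 16-bit PCM output
        "-ar", str(sample_rate),  # keep the original sample rate
        wav_path,
    ]


def get_sample_rate_sketch(video_path: str, timeout: float = 60.0) -> int:
    """Run ffprobe and return the sample rate; requires ffprobe on PATH."""
    result = subprocess.run(
        ffprobe_sample_rate_cmd(video_path),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return parse_sample_rate(result.stdout)
```

Separating command construction and output parsing from the subprocess call makes both testable without ffmpeg installed.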

Configuration

CLI accepts parameters via arguments or YAML config:

n_fft: 2048
hop_length: 512
n_mels: 128
frequency_bands:
  bark: [150, 2000]
  snore: [50, 300]
  chirp: [2000, 8000]
thresholds:
  energy_diff_threshold: 10.0
  sample_rate_mismatch: true

Report Output

{
  "video_path": "video.mp4",
  "original_sample_rate": 44100,
  "used_sample_rate": 44100,
  "spectrogram_path": "reports/video_spectrogram.png",
  "frequency_band_energies": {
    "bark": -45.2,
    "snore": -65.1
  },
  "suspicion": true,
  "suspicion_reasons": ["Large energy variation: 28.9 dB"]
}
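
The per-band dB values in a report like this could be derived from a power spectrum roughly as follows. This is a simplified stand-in for measure_energy_in_band(): it works on plain lists of bin frequencies and power values rather than a librosa spectrogram, and the name band_energy_db, the use of mean power, and the 1e-12 floor are assumptions.

```python
import math
from typing import Sequence


def band_energy_db(
    freqs_hz: Sequence[float],
    power: Sequence[float],
    f_lo: float,
    f_hi: float,
    floor: float = 1e-12,
) -> float:
    """Mean power of the bins inside [f_lo, f_hi], expressed in dB."""
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")

    # Keep only the bins whose centre frequency falls inside the band.
    in_band = [p for f, p in zip(freqs_hz, power) if f_lo <= f <= f_hi]
    if not in_band:
        return 10.0 * math.log10(floor)

    mean_power = sum(in_band) / len(in_band)
    # The floor avoids log10(0) for silent bands.
    return 10.0 * math.log10(max(mean_power, floor))


# Toy spectrum: strong low-frequency content, very little above 2 kHz.
freqs = [100.0, 200.0, 1000.0, 4000.0]
power = [1.0, 1.0, 0.01, 0.0001]
snore_db = band_energy_db(freqs, power, 50, 300)
chirp_db = band_energy_db(freqs, power, 2000, 8000)
```

Note that the two band values shown in the sample report differ by 19.9 dB, so the 28.9 dB in suspicion_reasons may reflect a band that is not listed in this excerpt.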

Testing & CI

10 unit tests covering critical functions, using synthetically generated audio
GitHub Actions workflow: matrix testing (Python 3.8-3.12), a smoke test, and linting
CodeQL scan: 0 vulnerabilities
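
Synthetic test audio of the kind mentioned above can be generated with the standard library alone. The sketch below is an assumption about how such a fixture might look, not the code in tests/test_audio_agent.py (which imports soundfile); write_sine_wav and its parameters are hypothetical.

```python
import math
import struct
import wave


def write_sine_wav(
    path: str,
    freq_hz: float,
    sample_rate: int = 22050,
    duration_s: float = 0.5,
    amplitude: float = 0.5,
) -> int:
    """Write a mono 16-bit sine tone to path and return the number of frames."""
    n_frames = int(sample_rate * duration_s)
    frames = bytearray()
    for n in range(n_frames):
        sample = amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        # Scale to signed 16-bit little-endian PCM.
        frames += struct.pack("<h", int(sample * 32767))

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return n_frames
```

A tone at a known frequency gives a test a predictable answer: a 200 Hz tone should put most of its energy in the snore band, while a 4 kHz tone should land in the chirp band.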

Usage

# Single file
python audio_diagnostic_agent.py --input video.mp4

# Directory with custom params
python audio_diagnostic_agent.py --input videos/ --n-mels 256 --threshold 15.0

# YAML config
python audio_diagnostic_agent.py --input videos/ --config config.yaml
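
An argument parser accepting the invocations above might look like the following sketch. The flags --input, --n-mels, --threshold, and --config appear in the usage examples, and --outdir and --topk are requested in the original prompt; the defaults chosen here ("reports", 5, and None for unset overrides) are assumptions, as is the function name build_parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage examples (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        prog="audio_diagnostic_agent.py",
        description="Audit audio in videos for possible classification mismatches.",
    )
    parser.add_argument("--input", required=True, help="video file or directory of videos")
    parser.add_argument("--outdir", default="reports", help="where reports and PNGs are written")
    parser.add_argument("--config", default=None, help="optional YAML configuration file")
    # None means "not given on the command line", so a config file value can apply.
    parser.add_argument("--n-mels", type=int, default=None, help="number of Mel bands")
    parser.add_argument("--threshold", type=float, default=None, help="energy difference threshold in dB")
    parser.add_argument("--topk", type=int, default=5, help="number of predictions to report")
    return parser


args = build_parser().parse_args(
    ["--input", "videos/", "--n-mels", "256", "--threshold", "15.0"]
)
```

Leaving the overrides at None by default makes it possible to tell an omitted flag apart from one explicitly set to the default value.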

Dependencies: librosa, soundfile, matplotlib, pyyaml (added to requirements-dev.txt)

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

esm.ubuntu.com
- Triggering command: /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

This pull request was created as a result of the following prompt from Copilot chat.

Create an audio audit agent (Python CLI) that automatically detects audio detection problems (e.g., an erroneous "snore" detection when the audio contains a "bark"). The agent must:

Walk a folder of videos or a single given video file.

For each video:

Retrieve the original sample rate via ffprobe.

Extract the audio to WAV while preserving the original sample rate (ffmpeg).

Load the audio (librosa/soundfile) while keeping the sample rate.

Compute a spectrogram (Mel spectrogram) with configurable parameters (n_fft, hop_length, n_mels, fmin, fmax).

Measure the energy in relevant frequency bands (e.g., the "bark" band vs. the "snore" band).

Try to call the project's inference function (if available) to get top-k predictions; otherwise, run a fallback that reads labels.txt if present.

Generate one JSON report per file (original_sr, used_sr, spectrogram_path, top_predictions, suspicion: true/false, and reasons).

Save PNG spectrograms and a global report (reports/report_TIMESTAMP.json).

Provide a CLI API with options: --input (file|dir), --outdir, --model (path or name), --topk, --config (yaml), --thresholds.

Include unit tests for the critical functions (get_sample_rate, compute_mel_spectrogram, energy_in_band).

Add a GitHub Action workflow that runs the tests and launches the agent in smoke-test mode on a small example if test audio files exist.

Deliverables (files to add/modify):

scripts/audio_diagnostic_agent.py: main CLI script.

scripts/utils_audio.py: utility functions (ffprobe, ffmpeg extract, spectrogram, energy band).

tests/test_audio_agent.py: unit tests.

.github/workflows/audio-diagnostics.yml: CI for tests and the smoke run.

requirements-dev.txt: required dependencies (librosa, soundfile, matplotlib, pytest).

Expected behavior:

The PR adds the agent and the CI. The code is compatible with Python 3.8+ and documented. Paths and parameters are configurable via a YAML file or CLI args. The report must let a developer reproduce why a prediction was made (sample rate mismatch, spectrogram params, energy in bands).



- Created scripts/audio_diagnostic_agent.py: Main CLI tool with full argument parsing
- Created scripts/utils_audio.py: Audio utilities (ffprobe, ffmpeg, spectrogram, energy bands)
- Created tests/test_audio_agent.py: Comprehensive unit tests for critical functions
- Created .github/workflows/audio-diagnostics.yml: CI workflow with tests and smoke test
- Updated requirements-dev.txt: Added librosa, soundfile, matplotlib, pyyaml, pytest
- Updated .gitignore: Exclude reports/ and temporary WAV files
- Added scripts/README.md: Complete documentation with examples
- Added scripts/config_example.yaml: Example configuration file
Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>

- Add 60s timeout to subprocess.run in extract_audio_wav to prevent hanging
- Move soundfile import to top of test file following Python conventions
- Remove duplicate soundfile imports within test functions
- Add implementation summary document
Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>

- Add explicit 'contents: read' permissions to all workflow jobs
- Resolves CodeQL security alerts for missing workflow permissions
- Follows security best practice of least privilege
Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>

Initial plan

f3ca376

Copilot AI assigned Copilot and hackolite Nov 6, 2025

Copilot started work on behalf of hackolite November 6, 2025 19:41

Copilot AI and others added 3 commits November 6, 2025 19:52

Copilot AI changed the title from "[WIP] Add audio audit agent for automatic detection issues" to "Add audio diagnostic agent to detect audio classification mismatches in videos" Nov 6, 2025

Copilot finished work on behalf of hackolite November 6, 2025 20:05

Copilot AI requested a review from hackolite November 6, 2025 20:05

hackolite force-pushed the main branch 2 times, most recently from d9f4029 to f5ce349 Compare November 9, 2025 10:29

hackolite force-pushed the main branch from 4c945df to f581acf Compare December 21, 2025 20:25

hackolite force-pushed the main branch from 5c355e1 to 3215af1 Compare January 2, 2026 02:07
